scikit-learn: RepeatedStratifiedKFold
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RepeatedStratifiedKFold.html
Randomized CV splitters may return different results for each call of split. You can make the results identical by setting random_state to an integer.
My understanding: each repetition yields a different split, but setting random_state to an integer makes the whole sequence of splits reproducible (see the example that follows).
code:example.py
>>> import numpy as np
>>> from sklearn.model_selection import RepeatedStratifiedKFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> rskf = RepeatedStratifiedKFold(n_splits=2, n_repeats=2, random_state=36851234)
>>> for train_index, test_index in rskf.split(X, y):
...     print("TRAIN:", train_index, "TEST:", test_index)
...
TRAIN: [1 2] TEST: [0 3]  # repeat 0
TRAIN: [0 3] TEST: [1 2]
TRAIN: [1 3] TEST: [0 2]  # repeat 1 (a different split from repeat 0)
TRAIN: [0 2] TEST: [1 3]
>>>
>>> for train_index, test_index in rskf.split(X, y):
...     print("TRAIN:", train_index, "TEST:", test_index)
...
TRAIN: [1 2] TEST: [0 3]  # same as the first call, thanks to random_state
TRAIN: [0 3] TEST: [1 2]
TRAIN: [1 3] TEST: [0 2]
TRAIN: [0 2] TEST: [1 3]
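
The reproducibility claim can also be checked programmatically rather than by reading the printed indices. The following is a minimal sketch, not part of the scikit-learn documentation: `collect_splits` is a hypothetical helper introduced only for this note, and the unseeded comparison at the end is shown without an expected result because it depends on NumPy's global random state.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])


def collect_splits(splitter, X, y):
    """Hypothetical helper: gather one split() call as lists of (train, test) indices."""
    return [(train.tolist(), test.tolist()) for train, test in splitter.split(X, y)]


# Integer random_state: every call to split() starts again from the same seed.
seeded = RepeatedStratifiedKFold(n_splits=2, n_repeats=2, random_state=36851234)
first_call = collect_splits(seeded, X, y)
second_call = collect_splits(seeded, X, y)
print(first_call == second_call)  # True

# One (train, test) pair per fold per repetition: n_splits * n_repeats.
print(len(first_call))  # 4

# random_state=None: two calls may or may not agree, so no expected value is given.
unseeded = RepeatedStratifiedKFold(n_splits=2, n_repeats=2, random_state=None)
print(collect_splits(unseeded, X, y) == collect_splits(unseeded, X, y))
```

Comparing plain lists instead of NumPy arrays keeps the equality check a single boolean. With only four samples the unseeded case has very few possible splits, so it can match by chance; a larger dataset would make any difference between calls easier to observe.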